
    Robust, fuzzy, and parsimonious clustering based on mixtures of Factor Analyzers

    A clustering algorithm that combines the advantages of fuzzy clustering and robust statistical estimators is presented. It is based on mixtures of Factor Analyzers and relies on the joint use of trimming and constrained estimation of the scatter matrices within a modified maximum likelihood approach. The algorithm generates a set of membership values that are used to fuzzily partition the data set and to contribute to the robust estimates of the mixture parameters. Modeling clusters by Gaussian Factor Analysis allows for dimension reduction and for discovering local linear structures in the data. The new methodology is shown to be resistant to different types of contamination by applying it to artificial data. A brief discussion of the tuning parameters (the trimming level, the fuzzifier parameter, the number of clusters, and the value of the scatter-matrix constraint) is provided, together with some heuristic tools for choosing them. Finally, a real data set is analyzed to show how intermediate membership values are estimated for observations lying where clusters overlap, while cluster cores are composed of observations assigned to a cluster in a crisp way. Funding: Ministerio de Economía y Competitividad grant MTM2017-86061-C2-1-P, and Consejería de Educación de la Junta de Castilla y León and FEDER grants VA005P17 and VA002G1.
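
    The interplay of fuzzy memberships and trimming can be pictured with a minimal sketch, which is not the authors' estimator: it assumes plain Gaussian components (rather than factor-analytic ones) and a simplified membership rule, and uses the mvtnorm package for the densities.

        # Illustrative sketch only: fuzzy memberships with trimming for fitted
        # Gaussian components (means `mu`, covariances `Sigma`, weights `pi_g`).
        # `m` is the fuzzifier and `alpha` the trimming level.
        fuzzy_trim_memberships <- function(x, mu, Sigma, pi_g, m = 1.5, alpha = 0.1) {
          G <- length(pi_g)
          dens <- sapply(seq_len(G), function(g)
            pi_g[g] * mvtnorm::dmvnorm(x, mean = mu[[g]], sigma = Sigma[[g]]))
          dens <- pmax(dens, .Machine$double.xmin)      # avoid 0/0 below
          # Trim the alpha proportion of points with the lowest best-fitting density
          best <- apply(dens, 1, max)
          keep <- best >= quantile(best, probs = alpha)
          # Simplified fuzzy memberships: normalized densities raised to 1/(m - 1);
          # trimmed observations get zero membership in every cluster
          u <- dens^(1 / (m - 1))
          u <- u / rowSums(u)
          u[!keep, ] <- 0
          u
        }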

    Robust constrained fuzzy clustering

    It is well known that outliers and noisy data can be very harmful when applying clustering methods. Several fuzzy clustering methods able to handle the presence of noise have been proposed. In this work, we propose a robust clustering approach called F-TCLUST based on an “impartial” (i.e., self-determined by the data) trimming. The proposed approach considers an eigenvalue ratio constraint that makes it a mathematically well-defined problem and serves to control the allowed differences among cluster scatters. A computationally feasible algorithm is proposed for its practical implementation. Some guidelines on how to choose the parameters controlling the performance of the fuzzy clustering procedure are also given.
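
    The eigenvalue ratio constraint can be pictured with a small sketch; the clipping rule below is only a naive way to satisfy the constraint, not the optimal truncation used inside the actual algorithm.

        # Naive illustration of an eigenvalue-ratio constraint: clip the eigenvalues
        # of every cluster scatter matrix from below so that the overall ratio
        # max(eigenvalue) / min(eigenvalue) across clusters is at most c_restr.
        constrain_scatters <- function(Sigma_list, c_restr = 12) {
          eig <- lapply(Sigma_list, eigen, symmetric = TRUE)
          vals <- unlist(lapply(eig, `[[`, "values"))
          lower <- max(vals) / c_restr
          lapply(eig, function(e) {
            d <- pmax(e$values, lower)
            e$vectors %*% diag(d, nrow = length(d)) %*% t(e$vectors)
          })
        }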

    A fast algorithm for robust constrained clustering

    The application of “concentration” steps is the main principle behind Forgy’s k-means algorithm and Rousseeuw and van Driessen’s fast-MCD algorithm. Despite this coincidence, it is not completely straightforward to combine both algorithms into a clustering method that is not severely affected by a few outlying observations and is able to cope with non-spherical clusters. A sensible way of combining them relies on controlling the relative cluster scatters through constrained concentration steps. With this idea in mind, a new algorithm for the TCLUST robust clustering procedure is proposed which implements such constrained concentration steps in a computationally efficient fashion.
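
    To fix ideas, here is a toy concentration step in the spirit of trimmed k-means (spherical clusters, no scatter constraint), showing the structure such steps take; the actual TCLUST steps additionally update weights and constrained scatter matrices.

        # Toy concentration step: assign points to the nearest center, discard the
        # alpha fraction with the largest distances, then refit the centers on the
        # retained observations only.
        concentration_step <- function(x, centers, alpha = 0.1) {
          d2 <- apply(centers, 1, function(m) rowSums(sweep(x, 2, m)^2))
          cl <- max.col(-d2)                                   # nearest center index
          best <- d2[cbind(seq_len(nrow(x)), cl)]
          keep <- best <= quantile(best, probs = 1 - alpha)    # keep the closest (1 - alpha) share
          new_centers <- t(sapply(seq_len(nrow(centers)), function(g)
            colMeans(x[keep & cl == g, , drop = FALSE])))
          list(centers = new_centers, assignment = ifelse(keep, cl, 0))  # 0 marks trimmed points
        }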

    Constrained parsimonious model-based clustering

    A new methodology for constrained parsimonious model-based clustering is introduced, where a tuning parameter allows the user to control the strength of the constraints. The methodology includes, as limiting cases, the 14 parsimonious models that are often applied in model-based clustering with normal components. This is done in a natural way, filling the gap among models and providing a smooth transition among them. The methodology yields mathematically well-defined problems and also helps to prevent spurious solutions. Novel information criteria are proposed to help the user choose the parameters. The interest of the proposed methodology is illustrated through simulation studies and a real-data application to COVID data. Funding: Ministerio de Economía y Competitividad (grant MTM2017-86061-C2-1-P); Junta de Castilla y León - FEDER (grants VA005P17 and VA002G18); CRoNoS COST and the University of Parma project “Statistics for fraud detection, with applications to commercial data and financial statements” (grant IC1408). Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), under Operational Programme 2014ES16RFOP009 FEDER 2014-2020 DE CASTILLA Y LEÓN, action 20007-CL - Apoyo Consorcio BUCL.
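
    As background (not taken from the paper itself), the 14 parsimonious Gaussian models referred to above arise from the classical volume/shape/orientation decomposition of the cluster scatter matrices; the sketch below only assembles a scatter matrix from these three elements, whereas the paper's constraints govern how much they may vary across clusters.

        # Background sketch: Sigma_g = lambda_g * D_g A_g t(D_g), with lambda_g the
        # volume, A_g a diagonal shape matrix with unit determinant and D_g an
        # orthogonal orientation matrix; the 14 parsimonious models differ in which
        # of these elements are common across clusters.
        make_sigma <- function(lambda, shape, orientation) {
          stopifnot(abs(prod(shape) - 1) < 1e-8)   # shape must have unit determinant
          lambda * orientation %*% diag(shape) %*% t(orientation)
        }

        # Example: volume 2, elongated shape, 30-degree rotation in the plane
        theta <- pi / 6
        D <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
        make_sigma(lambda = 2, shape = c(2, 0.5), orientation = D)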

    tclust: An R Package for a Trimming Approach to Cluster Analysis

    Outlying data can heavily influence standard clustering methods. At the same time, clustering principles can be useful when robustifying statistical procedures. These two reasons motivate the development of feasible robust model-based clustering approaches. With this in mind, an R package for performing non-hierarchical robust clustering, called tclust, is presented here. Instead of trying to “fit” noisy data, a proportion α of the most outlying observations is trimmed. The tclust package efficiently handles different cluster scatter constraints. Graphical exploratory tools are also provided to help the user make sensible choices for the trimming proportion as well as the number of clusters to search for.
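
    A minimal usage illustration follows; function and argument names reflect recent versions of the package and may differ slightly across releases, and the simulated data are purely for demonstration.

        # Illustrative tclust session on simulated two-group data with scattered noise
        library(tclust)

        set.seed(1)
        x <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
                   matrix(rnorm(200, mean = 5), ncol = 2),
                   matrix(runif(20, -10, 15), ncol = 2))   # 10 background outliers

        # Robust clustering: k = 2 groups, 8% trimming, eigenvalue-ratio bound 50
        clus <- tclust(x, k = 2, alpha = 0.08, restr.fact = 50)
        plot(clus)

        # Classification trimmed likelihood curves, an exploratory tool for choosing
        # the number of clusters and the trimming proportion
        ctl <- ctlcurves(x, k = 1:3, alpha = seq(0, 0.15, by = 0.05))
        plot(ctl)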

    Fuzzy Clustering Through Robust Factor Analyzers

    In fuzzy clustering, data elements can belong to more than one cluster, and membership levels are associated with each element to indicate the strength of the association between that element and a particular cluster. Unfortunately, fuzzy clustering is not robust, while in real applications the data are often contaminated by outliers and noise, and the assumed underlying Gaussian distributions may be unrealistic. Here we propose a robust fuzzy estimator for clustering through Factor Analyzers, introducing the joint use of trimming and of constrained estimation of the noise matrices in the classic maximum likelihood approach.

    Graphical and computational tools to guide parameter choice for the cluster weighted robust model

    The Cluster Weighted Robust Model (CWRM) is a recently introduced methodology to robustly estimate mixtures of regressions with random covariates. The CWRM allows users to flexibly perform regression clustering while safeguarding against data contamination and spurious solutions. Nonetheless, the resulting solution depends on the chosen number of components in the mixture, the percentage of impartial trimming, and the degree of heteroscedasticity allowed for the errors around the regression lines and for the clusters in the explanatory variables. An appropriate model selection step is therefore crucial. Such a complex modeling task may generate several “legitimate” solutions, each derived from a distinct hyperparameter specification. The present paper introduces a two-step monitoring procedure to help users effectively explore such a vast model space. The first phase uncovers the most appropriate percentages of trimming, whilst the second phase explores the whole set of solutions, conditioning on the outcome of the previous step. The final output singles out a set of “top” solutions, whose optimality, stability and validity are assessed. Novel graphical and computational tools, specifically tailored to the CWRM framework, help the user make an educated choice among the optimal solutions. Three examples on real datasets showcase our proposal in action. Supplementary files for this article are available online.
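
    The first monitoring phase can be pictured with a highly simplified skeleton; fit_cwrm below is a hypothetical fitter (not a function from any package named here) returning the maximized objective, and the stabilization heuristic is only one possible reading of that phase.

        # Skeleton of the first phase: refit the model over a grid of trimming
        # levels (for a fixed number of groups G) and record the objective, so the
        # user can inspect where it stabilizes. `fit_cwrm` is a hypothetical fitter
        # returning a list with an `obj` component.
        monitor_trimming <- function(y, X, G, alphas = seq(0, 0.2, by = 0.02), fit_cwrm) {
          data.frame(alpha = alphas,
                     objective = sapply(alphas, function(a) fit_cwrm(y, X, G = G, alpha = a)$obj))
        }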

    A Fuzzy Approach to Robust Clusterwise Regression

    A new robust fuzzy linear clustering method is proposed. We estimate the coefficients of a linear regression model in each unknown cluster. Our method aims to achieve robustness by trimming a fixed proportion of observations. Assignments to clusters are fuzzy: observations contribute to estimates in more than one cluster. We describe general criteria for tuning the method. The proposed method appears to be robust with respect to different types of contamination.
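
    The coefficient update at the heart of such a procedure can be sketched as follows, assuming a membership matrix u (one row per observation, one column per cluster, with trimmed observations given zero weight in every column) has already been computed; this illustrates the weighted least-squares idea, not the paper's exact estimator.

        # Weighted least-squares update of the clusterwise regression coefficients,
        # using the fuzzy memberships as observation weights (zero for trimmed points).
        update_coefficients <- function(y, X, u) {
          lapply(seq_len(ncol(u)), function(g) {
            fit <- stats::lm.wfit(x = cbind(1, X), y = y, w = u[, g])
            fit$coefficients
          })
        }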

    Finding the Number of Groups in Model-Based Clustering via Constrained Likelihoods

    Deciding the number of clusters k is one of the most difficult problems in cluster analysis. For this purpose, complexity-penalized likelihood approaches have been introduced in model-based clustering, such as the well-known BIC and ICL criteria. However, the classification/mixture likelihoods considered in these approaches are unbounded without any constraint on the cluster scatter matrices. Constraints also prevent traditional EM and CEM algorithms from being trapped in (spurious) local maxima. Controlling the maximal ratio between the eigenvalues of the scatter matrices so that it is smaller than a fixed constant c ≥ 1 is a sensible way of setting such constraints. A new penalized likelihood criterion, which takes into account the higher model complexity that a larger value of c entails, is proposed. Based on this criterion, a novel and fully automated procedure is provided, leading to a small ranked list of optimal (k, c) pairs. Its performance is assessed both in empirical examples and through a simulation study as a function of cluster overlap.
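
    The model-selection idea can be sketched as a grid search over (k, c); everything below is schematic: fit_constrained is a hypothetical fitter returning the maximized log-likelihood and a parameter count, and the extra log(c) toll is only a placeholder for the criterion actually derived in the paper.

        # Schematic grid search over (k, c): rank the pairs by a penalized
        # likelihood. The penalty shown is a BIC-style placeholder inflated by a
        # term growing with c, NOT the criterion proposed in the paper.
        select_k_c <- function(x, k_grid = 1:5, c_grid = c(1, 4, 16, 64), fit_constrained) {
          grid <- expand.grid(k = k_grid, c = c_grid)
          grid$crit <- apply(grid, 1, function(p) {
            fit <- fit_constrained(x, k = p["k"], c = p["c"])
            -2 * fit$loglik + (fit$npar + log(p["c"])) * log(nrow(x))
          })
          grid[order(grid$crit), ]   # smaller criterion values first
        }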

    A Reweighting Approach to Robust Clustering

    An iteratively reweighted approach for robust clustering is presented in this work. The method is initialized with a very robust clustering partition based on a high trimming level. The initial partition is then refined to reduce the number of wrongly discarded observations and to substantially increase efficiency. Simulation studies and real data examples indicate that the final clustering solution is both robust and efficient, and naturally adapts to the true underlying contamination level.
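
    A single reinstatement step of such a scheme could look like the sketch below, which assumes Gaussian clusters and a chi-square cut-off on Mahalanobis distances; this is one natural reading of the reweighting idea, not the paper's exact rule.

        # One reinstatement step: observations discarded by an initial, heavily
        # trimmed fit are re-included whenever their Mahalanobis distance to the
        # closest cluster falls below a chi-square quantile.
        reinstate <- function(x, centers, covs, level = 0.975) {
          d2 <- sapply(seq_along(covs), function(g)
            mahalanobis(x, center = centers[g, ], cov = covs[[g]]))
          best <- apply(d2, 1, min)
          best <= qchisq(level, df = ncol(x))   # TRUE = keep / re-include the observation
        }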